Dataset statistics
| Number of variables | 19 |
|---|---|
| Number of observations | 800 |
| Missing cells | 13 |
| Missing cells (%) | 0.1% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 157.3 KiB |
| Average record size in memory | 201.3 B |
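The derived figures in this table can be recomputed from the raw counts. The following is a minimal sketch, assuming that the missing-cell percentage is taken over all cells (variables × observations) and that 1 KiB is 1024 bytes; the variable names are illustrative and do not come from the report.

```python
# Recompute the derived dataset statistics from the raw counts reported above.
n_variables = 19
n_observations = 800
missing_cells = 13
total_size_kib = 157.3  # as reported; 1 KiB = 1024 bytes is an assumption

total_cells = n_variables * n_observations
missing_pct = missing_cells / total_cells * 100
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Total cells: {total_cells}")
print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```

Under these assumptions the rounded results agree with the 0.1% and 201.3 B shown in the table.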
Variable types
| Numeric | 8 |
|---|---|
| Categorical | 9 |
| Boolean | 2 |
Alerts
| Alert | Type |
|---|---|
| Name has a high cardinality: 799 distinct values | High cardinality |
| Checkup is highly overall correlated with Disease | High correlation |
| Disease is highly overall correlated with Checkup and 3 other fields | High correlation |
| Height is highly overall correlated with Weight | High correlation |
| Weight is highly overall correlated with Height | High correlation |
| Diabetes is highly overall correlated with Exercise and 1 other field | High correlation |
| Mental_Health is highly overall correlated with Disease | High correlation |
| Physical_Health is highly overall correlated with Drinking_Habit | High correlation |
| Exercise is highly overall correlated with Diabetes and 1 other field | High correlation |
| Drinking_Habit is highly overall correlated with Physical_Health | High correlation |
| Education has 13 (1.6%) missing values | Missing |
| Name is uniformly distributed | Uniform |
| PatientID has unique values | Unique |
| Physical_Health has 311 (38.9%) zeros | Zeros |
Reproduction
| Analysis started | 2022-12-17 15:54:27.783062 |
|---|---|
| Analysis finished | 2022-12-17 15:54:33.497049 |
| Duration | 5.71 seconds |
| Software version | pandas-profiling vdev |
| Download configuration | config.json |
PatientID
Real number (ℝ)
| Distinct | 800 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1513.9987 |
| Minimum | 1001 |
| Maximum | 2024 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 44.8 KiB |
Quantile statistics
| Minimum | 1001 |
|---|---|
| 5-th percentile | 1050.95 |
| Q1 | 1247.5 |
| median | 1519.5 |
| Q3 | 1777.25 |
| 95-th percentile | 1979.05 |
| Maximum | 2024 |
| Range | 1023 |
| Interquartile range (IQR) | 529.75 |
Descriptive statistics
| Standard deviation | 300.87463 |
|---|---|
| Coefficient of variation (CV) | 0.19872845 |
| Kurtosis | -1.235007 |
| Mean | 1513.9987 |
| Median Absolute Deviation (MAD) | 265 |
| Skewness | -0.0063300739 |
| Sum | 1211199 |
| Variance | 90525.543 |
| Monotonicity | Not monotonic |
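Several of the quantile and descriptive statistics are simple functions of one another. The sketch below recomputes them from the PatientID values listed above; it assumes the usual definitions (range = maximum − minimum, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared), which the report itself does not spell out.

```python
# Values copied from the PatientID tables above.
minimum, maximum = 1001, 2024
q1, q3 = 1247.5, 1777.25
mean = 1513.9987
std = 300.87463

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range (IQR)
cv = std / mean                  # Coefficient of variation (CV)
variance = std ** 2              # Variance

print(f"Range: {value_range}")
print(f"IQR: {iqr}")
print(f"CV: {cv:.8f}")
print(f"Variance: {variance:.3f}")
```

The same relations can be used as a sanity check on the other numeric variables in the report.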
| Value | Count | Frequency (%) |
| 1167 | 1 | 0.1% |
| 1879 | 1 | 0.1% |
| 1780 | 1 | 0.1% |
| 1237 | 1 | 0.1% |
| 1656 | 1 | 0.1% |
| 1222 | 1 | 0.1% |
| 1543 | 1 | 0.1% |
| 2015 | 1 | 0.1% |
| 1845 | 1 | 0.1% |
| 1855 | 1 | 0.1% |
| Other values (790) | 790 |
| Value | Count | Frequency (%) |
| 1001 | 1 | 0.1% |
| 1003 | 1 | 0.1% |
| 1004 | 1 | 0.1% |
| 1005 | 1 | 0.1% |
| 1006 | 1 | 0.1% |
| 1008 | 1 | 0.1% |
| 1009 | 1 | 0.1% |
| 1010 | 1 | 0.1% |
| 1011 | 1 | 0.1% |
| 1012 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 2024 | 1 | 0.1% |
| 2023 | 1 | 0.1% |
| 2022 | 1 | 0.1% |
| 2020 | 1 | 0.1% |
| 2019 | 1 | 0.1% |
| 2018 | 1 | 0.1% |
| 2017 | 1 | 0.1% |
| 2016 | 1 | 0.1% |
| 2015 | 1 | 0.1% |
| 2014 | 1 | 0.1% |
Name
Categorical
| Distinct | 799 |
|---|---|
| Distinct (%) | 99.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 44.8 KiB |
Length
| Max length | 25 |
|---|---|
| Median length | 23 |
| Mean length | 17.39375 |
| Min length | 11 |
Characters and Unicode
| Total characters | 13915 |
|---|---|
| Distinct characters | 54 |
| Distinct categories | 4 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
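The category counts in this section can be approximated with Python's standard `unicodedata` module. This is a sketch only, assuming the report groups characters by Unicode general category; `category_counts` is a hypothetical helper, and the two-letter codes it returns correspond to the report's labels (`Lu` = Uppercase Letter, `Ll` = Lowercase Letter, `Zs` = Space Separator, `Po` = Other Punctuation).

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    return counts


# Two names that appear in this report; the full column has 800 entries.
sample = ["Mrs. Stephanie Gay", "Mr. Gary Miller"]
counts = category_counts(sample)
print(len(counts), dict(counts))
```

Even on two names the sketch finds the same four categories that the report lists for the whole column.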
Unique
| Unique | 798 |
|---|---|
| Unique (%) | 99.8% |
Sample
| 1st row | Mrs. Stephanie Gay |
|---|---|
| 2nd row | Mr. Sherman Nero |
| 3rd row | Mr. Mark Boller |
| 4th row | Mr. David Caffee |
| 5th row | Mr. Gerald Emery |
Common Values
| Value | Count | Frequency (%) |
| Mr. Gary Miller | 2 | 0.2% |
| Mrs. Stephanie Gay | 1 | 0.1% |
| Mr. Roger Rudd | 1 | 0.1% |
| Mr. Vito Ertz | 1 | 0.1% |
| Mrs. Marilyn Miller | 1 | 0.1% |
| Mr. David Hench | 1 | 0.1% |
| Mr. Blair Simmons | 1 | 0.1% |
| Mr. James Luna | 1 | 0.1% |
| Mr. Irwin Mcclure | 1 | 0.1% |
| Mr. Todd Doster | 1 | 0.1% |
| Other values (789) | 789 |
Length (histogram omitted)
Words
| Value | Count | Frequency (%) |
| mr | 564 | 23.5% |
| mrs | 236 | 9.8% |
| michael | 23 | 1.0% |
| james | 22 | 0.9% |
| robert | 20 | 0.8% |
| john | 19 | 0.8% |
| david | 16 | 0.7% |
| richard | 15 | 0.6% |
| william | 12 | 0.5% |
| timothy | 11 | 0.5% |
| Other values (1028) | 1462 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 1600 | 11.5% |
| r | 1556 | 11.2% |
| e | 983 | 7.1% |
| M | 973 | 7.0% |
| a | 925 | 6.6% |
| . | 800 | 5.7% |
| n | 718 | 5.2% |
| s | 645 | 4.6% |
| i | 616 | 4.4% |
| o | 612 | 4.4% |
| Other values (44) | 4487 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 9115 | |
| Uppercase Letter | 2400 | 17.2% |
| Space Separator | 1600 | 11.5% |
| Other Punctuation | 800 | 5.7% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| r | 1556 | |
| e | 983 | |
| a | 925 | |
| n | 718 | |
| s | 645 | 7.1% |
| i | 616 | 6.8% |
| o | 612 | 6.7% |
| l | 607 | 6.7% |
| t | 353 | 3.9% |
| h | 319 | 3.5% |
| Other values (16) | 1781 |
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 973 | |
| J | 136 | 5.7% |
| C | 125 | 5.2% |
| R | 123 | 5.1% |
| B | 103 | 4.3% |
| S | 101 | 4.2% |
| D | 94 | 3.9% |
| L | 86 | 3.6% |
| A | 75 | 3.1% |
| W | 74 | 3.1% |
| Other values (16) | 510 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 1600 | |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 800 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 11515 | |
| Common | 2400 | 17.2% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| r | 1556 | |
| e | 983 | 8.5% |
| M | 973 | 8.4% |
| a | 925 | 8.0% |
| n | 718 | 6.2% |
| s | 645 | 5.6% |
| i | 616 | 5.3% |
| o | 612 | 5.3% |
| l | 607 | 5.3% |
| t | 353 | 3.1% |
| Other values (42) | 3527 |
Common
| Value | Count | Frequency (%) |
| (space) | 1600 | |
| . | 800 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 13915 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 1600 | 11.5% |
| r | 1556 | 11.2% |
| e | 983 | 7.1% |
| M | 973 | 7.0% |
| a | 925 | 6.6% |
| . | 800 | 5.7% |
| n | 718 | 5.2% |
| s | 645 | 4.6% |
| i | 616 | 4.4% |
| o | 612 | 4.4% |
| Other values (44) | 4487 |
Birth_Year
Real number (ℝ)
| Distinct | 50 |
|---|---|
| Distinct (%) | 6.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1966.0438 |
| Minimum | 1855 |
| Maximum | 1993 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 44.8 KiB |
Quantile statistics
| Minimum | 1855 |
|---|---|
| 5-th percentile | 1953 |
| Q1 | 1961 |
| median | 1966 |
| Q3 | 1974 |
| 95-th percentile | 1982.05 |
| Maximum | 1993 |
| Range | 138 |
| Interquartile range (IQR) | 13 |
Descriptive statistics
| Standard deviation | 15.421872 |
|---|---|
| Coefficient of variation (CV) | 0.0078441142 |
| Kurtosis | 26.559098 |
| Mean | 1966.0438 |
| Median Absolute Deviation (MAD) | 6 |
| Skewness | -4.2088125 |
| Sum | 1572835 |
| Variance | 237.83413 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1964 | 57 | 7.1% |
| 1965 | 47 | 5.9% |
| 1963 | 38 | 4.8% |
| 1968 | 37 | 4.6% |
| 1970 | 31 | 3.9% |
| 1966 | 31 | 3.9% |
| 1971 | 31 | 3.9% |
| 1960 | 29 | 3.6% |
| 1962 | 29 | 3.6% |
| 1958 | 27 | 3.4% |
| Other values (40) | 443 |
| Value | Count | Frequency (%) |
| 1855 | 1 | 0.1% |
| 1859 | 3 | |
| 1860 | 1 | 0.1% |
| 1864 | 2 | |
| 1866 | 1 | 0.1% |
| 1867 | 1 | 0.1% |
| 1869 | 1 | 0.1% |
| 1870 | 1 | 0.1% |
| 1881 | 1 | 0.1% |
| 1945 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 1993 | 4 | 0.5% |
| 1988 | 4 | 0.5% |
| 1987 | 12 | |
| 1985 | 2 | 0.2% |
| 1984 | 9 | 1.1% |
| 1983 | 9 | 1.1% |
| 1982 | 7 | 0.9% |
| 1981 | 25 | |
| 1980 | 22 | |
| 1979 | 22 |
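The smallest Birth_Year values (1855 to 1881) sit far from the rest of the distribution, which is consistent with the strongly negative skewness reported above. One common way to flag such values is Tukey's 1.5 × IQR rule. The sketch below applies that rule to the quartiles from the quantile table; the rule is a convention chosen here for illustration, not something the report itself applies.

```python
# Quartiles copied from the Birth_Year quantile table.
q1, q3 = 1961, 1974
iqr = q3 - q1

# Tukey fences (1.5 x IQR is a conventional multiplier, assumed here).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# The ten smallest values listed in the table above.
smallest = [1855, 1859, 1860, 1864, 1866, 1867, 1869, 1870, 1881, 1945]
flagged = [year for year in smallest if year < lower_fence]

print(f"Fences: {lower_fence} to {upper_fence}")
print(f"Flagged as low outliers: {flagged}")
```

With these fences, every listed year up to 1881 is flagged, while 1945 and the maximum of 1993 fall inside the fences.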
Region
Categorical
| Distinct | 10 |
|---|---|
| Distinct (%) | 1.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 44.8 KiB |
Length
| Max length | 24 |
|---|---|
| Median length | 15 |
| Mean length | 11.82625 |
| Min length | 6 |
Characters and Unicode
| Total characters | 9461 |
|---|---|
| Distinct characters | 28 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | London |
|---|---|
| 2nd row | South West |
| 3rd row | Yorkshire and the Humber |
| 4th row | London |
| 5th row | South East |
Common Values
| Value | Count | Frequency (%) |
| East Midlands | 154 | |
| London | 136 | |
| South West | 107 | |
| West Midlands | 89 | |
| South East | 84 | |
| East of England | 80 | |
| Yorkshire and the Humber | 64 | |
| North West | 59 | 7.4% |
| North East | 22 | 2.8% |
| LONDON | 5 | 0.6% |
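The table lists `London` (136) and `LONDON` (5) as separate categories, which suggests inconsistent casing in the source data rather than two real regions. A minimal sketch of merging case variants follows; `merge_case_variants` is a hypothetical helper, and case-folding is only one possible normalisation.

```python
from collections import Counter

# Counts copied from the Common Values table above.
region_counts = {
    "East Midlands": 154,
    "London": 136,
    "South West": 107,
    "West Midlands": 89,
    "South East": 84,
    "East of England": 80,
    "Yorkshire and the Humber": 64,
    "North West": 59,
    "North East": 22,
    "LONDON": 5,
}


def merge_case_variants(counts):
    """Sum counts of labels that differ only by letter case."""
    merged = Counter()
    for label, count in counts.items():
        merged[label.casefold()] += count
    return merged


merged = merge_case_variants(region_counts)
print(len(merged), merged["london"])
```

After merging, nine regions remain and `london` totals 141, which matches the lowercase token count for `london` reported further down in this section.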
Length, Common Values (Plot): charts omitted
Words
| Value | Count | Frequency (%) |
| east | 340 | |
| west | 255 | |
| midlands | 243 | |
| south | 191 | |
| london | 141 | |
| north | 81 | 4.9% |
| of | 80 | 4.8% |
| england | 80 | 4.8% |
| yorkshire | 64 | 3.8% |
| and | 64 | 3.8% |
| Other values (2) | 128 | 7.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| t | 931 | 9.8% |
| s | 902 | 9.5% |
| (space) | 867 | 9.2% |
| d | 766 | 8.1% |
| n | 739 | 7.8% |
| a | 727 | 7.7% |
| o | 688 | 7.3% |
| e | 447 | 4.7% |
| E | 420 | 4.4% |
| h | 400 | 4.2% |
| Other values (18) | 2574 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 7110 | |
| Uppercase Letter | 1484 | 15.7% |
| Space Separator | 867 | 9.2% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| t | 931 | |
| s | 902 | |
| d | 766 | |
| n | 739 | |
| a | 727 | |
| o | 688 | |
| e | 447 | |
| h | 400 | |
| l | 323 | 4.5% |
| i | 307 | 4.3% |
| Other values (7) | 880 |
Uppercase Letter
| Value | Count | Frequency (%) |
| E | 420 | |
| W | 255 | |
| M | 243 | |
| S | 191 | |
| L | 141 | 9.5% |
| N | 91 | 6.1% |
| Y | 64 | 4.3% |
| H | 64 | 4.3% |
| O | 10 | 0.7% |
| D | 5 | 0.3% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 867 | |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 8594 | |
| Common | 867 | 9.2% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| t | 931 | |
| s | 902 | |
| d | 766 | 8.9% |
| n | 739 | 8.6% |
| a | 727 | 8.5% |
| o | 688 | 8.0% |
| e | 447 | 5.2% |
| E | 420 | 4.9% |
| h | 400 | 4.7% |
| l | 323 | 3.8% |
| Other values (17) | 2251 |
Common
| Value | Count | Frequency (%) |
| (space) | 867 | |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 9461 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| t | 931 | 9.8% |
| s | 902 | 9.5% |
| (space) | 867 | 9.2% |
| d | 766 | 8.1% |
| n | 739 | 7.8% |
| a | 727 | 7.7% |
| o | 688 | 7.3% |
| e | 447 | 4.7% |
| E | 420 | 4.4% |
| h | 400 | 4.2% |
| Other values (18) | 2574 |
Education
Categorical
| Distinct | 6 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 13 |
| Missing (%) | 1.6% |
| Memory size | 44.8 KiB |
Length
| Max length | 43 |
|---|---|
| Median length | 37 |
| Mean length | 33.035578 |
| Min length | 20 |
Characters and Unicode
| Total characters | 25999 |
|---|---|
| Distinct characters | 35 |
| Distinct categories | 7 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | High School Incomplete (10th to 11th grade) |
|---|---|
| 2nd row | High School Incomplete (10th to 11th grade) |
| 3rd row | Elementary School (1st to 9th grade) |
| 4th row | University Complete (3 or more years) |
| 5th row | University Incomplete (1 to 2 years) |
Common Values
| Value | Count | Frequency (%) |
| University Complete (3 or more years) | 239 | |
| High School Graduate | 196 | |
| Elementary School (1st to 9th grade) | 183 | |
| High School Incomplete (10th to 11th grade) | 102 | |
| University Incomplete (1 to 2 years) | 37 | 4.6% |
| I never attended school / Other | 30 | 3.8% |
| (Missing) | 13 | 1.6% |
Length, Common Values (Plot): charts omitted
Words
| Value | Count | Frequency (%) |
| school | 511 | 12.1% |
| to | 322 | 7.6% |
| high | 298 | 7.0% |
| grade | 285 | 6.7% |
| university | 276 | 6.5% |
| years | 276 | 6.5% |
| 3 | 239 | 5.6% |
| or | 239 | 5.6% |
| more | 239 | 5.6% |
| complete | 239 | 5.6% |
| Other values (14) | 1312 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 3449 | 13.3% |
| e | 2544 | 9.8% |
| o | 2200 | 8.5% |
| t | 2015 | 7.8% |
| r | 1754 | 6.7% |
| h | 1226 | 4.7% |
| a | 1166 | 4.5% |
| l | 1072 | 4.1% |
| i | 850 | 3.3% |
| m | 800 | 3.1% |
| Other values (25) | 8923 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 18439 | |
| Space Separator | 3449 | 13.3% |
| Uppercase Letter | 1872 | 7.2% |
| Decimal Number | 1087 | 4.2% |
| Open Punctuation | 561 | 2.2% |
| Close Punctuation | 561 | 2.2% |
| Other Punctuation | 30 | 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 2544 | |
| o | 2200 | |
| t | 2015 | |
| r | 1754 | |
| h | 1226 | 6.6% |
| a | 1166 | 6.3% |
| l | 1072 | 5.8% |
| i | 850 | 4.6% |
| m | 800 | 4.3% |
| s | 765 | 4.1% |
| Other values (8) | 4047 |
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 481 | |
| H | 298 | |
| U | 276 | |
| C | 239 | |
| G | 196 | |
| E | 183 | 9.8% |
| I | 169 | 9.0% |
| O | 30 | 1.6% |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 526 | |
| 3 | 239 | |
| 9 | 183 | 16.8% |
| 0 | 102 | 9.4% |
| 2 | 37 | 3.4% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 3449 | |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 561 |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 561 |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 30 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 20311 | |
| Common | 5688 | 21.9% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 2544 | |
| o | 2200 | 10.8% |
| t | 2015 | 9.9% |
| r | 1754 | 8.6% |
| h | 1226 | 6.0% |
| a | 1166 | 5.7% |
| l | 1072 | 5.3% |
| i | 850 | 4.2% |
| m | 800 | 3.9% |
| s | 765 | 3.8% |
| Other values (16) | 5919 |
Common
| Value | Count | Frequency (%) |
| (space) | 3449 | |
| ( | 561 | 9.9% |
| ) | 561 | 9.9% |
| 1 | 526 | 9.2% |
| 3 | 239 | 4.2% |
| 9 | 183 | 3.2% |
| 0 | 102 | 1.8% |
| 2 | 37 | 0.7% |
| / | 30 | 0.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 25999 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 3449 | 13.3% |
| e | 2544 | 9.8% |
| o | 2200 | 8.5% |
| t | 2015 | 7.8% |
| r | 1754 | 6.7% |
| h | 1226 | 4.7% |
| a | 1166 | 4.5% |
| l | 1072 | 4.1% |
| i | 850 | 3.3% |
| m | 800 | 3.1% |
| Other values (25) | 8923 |
Height
Real number (ℝ)
| Distinct | 15 |
|---|---|
| Distinct (%) | 1.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 167.80625 |
| Minimum | 151 |
| Maximum | 180 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 44.8 KiB |
Quantile statistics
| Minimum | 151 |
|---|---|
| 5-th percentile | 154 |
| Q1 | 162 |
| median | 167 |
| Q3 | 173 |
| 95-th percentile | 180 |
| Maximum | 180 |
| Range | 29 |
| Interquartile range (IQR) | 11 |
Descriptive statistics
| Standard deviation | 7.9768885 |
|---|---|
| Coefficient of variation (CV) | 0.047536301 |
| Kurtosis | -0.88631625 |
| Mean | 167.80625 |
| Median Absolute Deviation (MAD) | 6 |
| Skewness | -0.33443479 |
| Sum | 134245 |
| Variance | 63.630749 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 167 | 98 | |
| 172 | 81 | |
| 178 | 74 | |
| 162 | 69 | |
| 174 | 66 | |
| 173 | 61 | |
| 180 | 57 | |
| 171 | 56 | 7.0% |
| 157 | 51 | 6.4% |
| 165 | 45 | 5.6% |
| Other values (5) | 142 |
| Value | Count | Frequency (%) |
| 151 | 21 | 2.6% |
| 154 | 26 | 3.2% |
| 155 | 31 | 3.9% |
| 157 | 51 | |
| 158 | 39 | 4.9% |
| 162 | 69 | |
| 165 | 45 | |
| 166 | 25 | 3.1% |
| 167 | 98 | |
| 171 | 56 |
| Value | Count | Frequency (%) |
| 180 | 57 | |
| 178 | 74 | |
| 174 | 66 | |
| 173 | 61 | |
| 172 | 81 | |
| 171 | 56 | |
| 167 | 98 | |
| 166 | 25 | 3.1% |
| 165 | 45 | |
| 162 | 69 |
Weight
Real number (ℝ)
| Distinct | 56 |
|---|---|
| Distinct (%) | 7.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 67.8275 |
| Minimum | 40 |
| Maximum | 97 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 44.8 KiB |
Quantile statistics
| Minimum | 40 |
|---|---|
| 5-th percentile | 49 |
| Q1 | 58 |
| median | 68 |
| Q3 | 77 |
| 95-th percentile | 88 |
| Maximum | 97 |
| Range | 57 |
| Interquartile range (IQR) | 19 |
Descriptive statistics
| Standard deviation | 12.11347 |
|---|---|
| Coefficient of variation (CV) | 0.17859232 |
| Kurtosis | -0.72390191 |
| Mean | 67.8275 |
| Median Absolute Deviation (MAD) | 9 |
| Skewness | 0.12617779 |
| Sum | 54262 |
| Variance | 146.73616 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 70 | 32 | 4.0% |
| 59 | 29 | 3.6% |
| 72 | 26 | 3.2% |
| 71 | 26 | 3.2% |
| 61 | 26 | 3.2% |
| 67 | 24 | 3.0% |
| 56 | 24 | 3.0% |
| 55 | 23 | 2.9% |
| 69 | 23 | 2.9% |
| 76 | 22 | 2.8% |
| Other values (46) | 545 |
| Value | Count | Frequency (%) |
| 40 | 1 | 0.1% |
| 41 | 2 | 0.2% |
| 42 | 1 | 0.1% |
| 44 | 1 | 0.1% |
| 45 | 10 | |
| 46 | 5 | 0.6% |
| 47 | 8 | |
| 48 | 8 | |
| 49 | 9 | |
| 50 | 13 |
| Value | Count | Frequency (%) |
| 97 | 3 | 0.4% |
| 96 | 4 | 0.5% |
| 95 | 2 | 0.2% |
| 94 | 1 | 0.1% |
| 93 | 4 | 0.5% |
| 92 | 4 | 0.5% |
| 90 | 6 | |
| 89 | 9 | |
| 88 | 12 | |
| 87 | 14 |
Checkup
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 44.8 KiB |
Length
| Max length | 38 |
|---|---|
| Median length | 17 |
| Mean length | 14.91875 |
| Min length | 8 |
Characters and Unicode
| Total characters | 11935 |
|---|---|
| Distinct characters | 18 |
| Distinct categories | 4 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | More than 3 years |
|---|---|
| 2nd row | Not sure |
| 3rd row | More than 3 years |
| 4th row | Not sure |
| 5th row | More than 3 years |
Common Values
| Value | Count | Frequency (%) |
| More than 3 years | 429 | |
| Not sure | 312 | |
| Less than 3 years but more than 1 year | 53 | 6.6% |
| Less than three months | 6 | 0.8% |
Length, Common Values (Plot): charts omitted
Words
| Value | Count | Frequency (%) |
| than | 541 | |
| more | 482 | |
| 3 | 482 | |
| years | 482 | |
| not | 312 | |
| sure | 312 | |
| less | 59 | 2.1% |
| but | 53 | 1.9% |
| 1 | 53 | 1.9% |
| year | 53 | 1.9% |
| Other values (2) | 12 | 0.4% |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 2041 | 17.1% |
| e | 1400 | |
| r | 1335 | |
| a | 1076 | |
| s | 918 | |
| t | 918 | |
| o | 800 | 6.7% |
| h | 553 | 4.6% |
| n | 547 | 4.6% |
| y | 535 | 4.5% |
| Other values (8) | 1812 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 8559 | |
| Space Separator | 2041 | 17.1% |
| Uppercase Letter | 800 | 6.7% |
| Decimal Number | 535 | 4.5% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 1400 | |
| r | 1335 | |
| a | 1076 | |
| s | 918 | |
| t | 918 | |
| o | 800 | |
| h | 553 | 6.5% |
| n | 547 | 6.4% |
| y | 535 | 6.3% |
| u | 365 | 4.3% |
| Other values (2) | 112 | 1.3% |
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 429 | |
| N | 312 | |
| L | 59 | 7.4% |
Decimal Number
| Value | Count | Frequency (%) |
| 3 | 482 | |
| 1 | 53 | 9.9% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 2041 | |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 9359 | |
| Common | 2576 | 21.6% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 1400 | |
| r | 1335 | |
| a | 1076 | |
| s | 918 | |
| t | 918 | |
| o | 800 | |
| h | 553 | 5.9% |
| n | 547 | 5.8% |
| y | 535 | 5.7% |
| M | 429 | 4.6% |
| Other values (5) | 848 |
Common
| Value | Count | Frequency (%) |
| (space) | 2041 | |
| 3 | 482 | 18.7% |
| 1 | 53 | 2.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 11935 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 2041 | 17.1% |
| e | 1400 | |
| r | 1335 | |
| a | 1076 | |
| s | 918 | |
| t | 918 | |
| o | 800 | 6.7% |
| h | 553 | 4.6% |
| n | 547 | 4.6% |
| y | 535 | 4.5% |
| Other values (8) | 1812 |
Diabetes
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 44.8 KiB |
Length
| Max length | 74 |
|---|---|
| Median length | 52 |
| Mean length | 45.515 |
| Min length | 18 |
Characters and Unicode
| Total characters | 36412 |
|---|---|
| Distinct characters | 28 |
| Distinct categories | 4 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Neither I nor my immediate family have diabetes. |
|---|---|
| 2nd row | Neither I nor my immediate family have diabetes. |
| 3rd row | Neither I nor my immediate family have diabetes. |
| 4th row | I have/had pregnancy diabetes or borderline diabetes |
| 5th row | I have/had pregnancy diabetes or borderline diabetes |
Common Values
| Value | Count | Frequency (%) |
| Neither I nor my immediate family have diabetes. | 392 | |
| I have/had pregnancy diabetes or borderline diabetes | 206 | |
| I do have diabetes | 144 | 18.0% |
| I don't have diabetes, but I have direct family members who have diabetes. | 58 | 7.2% |
Length, Common Values (Plot): charts omitted
Words
| Value | Count | Frequency (%) |
| diabetes | 1064 | |
| i | 858 | |
| have | 710 | |
| family | 450 | |
| neither | 392 | 6.6% |
| nor | 392 | 6.6% |
| my | 392 | 6.6% |
| immediate | 392 | 6.6% |
| or | 206 | 3.5% |
| borderline | 206 | 3.5% |
| Other values (8) | 846 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 5404 | |
| (space) | 5108 | 14.0% |
| a | 3234 | 8.9% |
| i | 2954 | 8.1% |
| d | 2128 | 5.8% |
| t | 2022 | 5.6% |
| m | 1742 | 4.8% |
| r | 1724 | 4.7% |
| h | 1572 | 4.3% |
| b | 1386 | 3.8% |
| Other values (18) | 9138 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 29282 | |
| Space Separator | 5108 | 14.0% |
| Uppercase Letter | 1250 | 3.4% |
| Other Punctuation | 772 | 2.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 5404 | |
| a | 3234 | |
| i | 2954 | |
| d | 2128 | 7.3% |
| t | 2022 | 6.9% |
| m | 1742 | 5.9% |
| r | 1724 | 5.9% |
| h | 1572 | 5.4% |
| b | 1386 | 4.7% |
| s | 1122 | 3.8% |
| Other values (11) | 5994 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 450 | |
| / | 206 | |
| ' | 58 | 7.5% |
| , | 58 | 7.5% |
Uppercase Letter
| Value | Count | Frequency (%) |
| I | 858 | |
| N | 392 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 5108 | |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 30532 | |
| Common | 5880 | 16.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 5404 | |
| a | 3234 | |
| i | 2954 | |
| d | 2128 | 7.0% |
| t | 2022 | 6.6% |
| m | 1742 | 5.7% |
| r | 1724 | 5.6% |
| h | 1572 | 5.1% |
| b | 1386 | 4.5% |
| s | 1122 | 3.7% |
| Other values (13) | 7244 |
Common
| Value | Count | Frequency (%) |
| (space) | 5108 | |
| . | 450 | 7.7% |
| / | 206 | 3.5% |
| ' | 58 | 1.0% |
| , | 58 | 1.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 36412 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 5404 | |
| (space) | 5108 | 14.0% |
| a | 3234 | 8.9% |
| i | 2954 | 8.1% |
| d | 2128 | 5.8% |
| t | 2022 | 5.6% |
| m | 1742 | 4.8% |
| r | 1724 | 4.7% |
| h | 1572 | 4.3% |
| b | 1386 | 3.8% |
| Other values (18) | 9138 |
High_Cholesterol
Real number (ℝ)
| Distinct | 150 |
|---|---|
| Distinct (%) | 18.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 249.3225 |
| Minimum | 130 |
| Maximum | 568 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 44.8 KiB |
Quantile statistics
| Minimum | 130 |
|---|---|
| 5-th percentile | 179 |
| Q1 | 213.75 |
| median | 244 |
| Q3 | 280 |
| 95-th percentile | 329.05 |
| Maximum | 568 |
| Range | 438 |
| Interquartile range (IQR) | 66.25 |
Descriptive statistics
| Standard deviation | 51.566631 |
|---|---|
| Coefficient of variation (CV) | 0.20682702 |
| Kurtosis | 4.9793639 |
| Mean | 249.3225 |
| Median Absolute Deviation (MAD) | 33 |
| Skewness | 1.1678961 |
| Sum | 199458 |
| Variance | 2659.1174 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 201 | 17 | 2.1% |
| 238 | 16 | 2.0% |
| 208 | 16 | 2.0% |
| 181 | 13 | 1.6% |
| 216 | 13 | 1.6% |
| 258 | 13 | 1.6% |
| 215 | 12 | 1.5% |
| 286 | 11 | 1.4% |
| 207 | 11 | 1.4% |
| 244 | 11 | 1.4% |
| Other values (140) | 667 |
| Value | Count | Frequency (%) |
| 130 | 3 | |
| 135 | 2 | 0.2% |
| 145 | 3 | |
| 153 | 6 | |
| 161 | 4 | |
| 164 | 3 | |
| 168 | 1 | 0.1% |
| 170 | 4 | |
| 171 | 3 | |
| 172 | 2 | 0.2% |
| Value | Count | Frequency (%) |
| 568 | 3 | |
| 421 | 2 | |
| 413 | 2 | |
| 411 | 2 | |
| 398 | 2 | |
| 358 | 3 | |
| 357 | 3 | |
| 346 | 4 | |
| 345 | 3 | |
| 344 | 2 |
Blood_Pressure
Real number (ℝ)
| Distinct | 49 |
|---|---|
| Distinct (%) | 6.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 131.05375 |
| Minimum | 94 |
| Maximum | 200 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 44.8 KiB |
Quantile statistics
| Minimum | 94 |
|---|---|
| 5-th percentile | 108 |
| Q1 | 120 |
| median | 130 |
| Q3 | 140 |
| 95-th percentile | 160 |
| Maximum | 200 |
| Range | 106 |
| Interquartile range (IQR) | 20 |
Descriptive statistics
| Standard deviation | 17.052693 |
|---|---|
| Coefficient of variation (CV) | 0.13011984 |
| Kurtosis | 1.233682 |
| Mean | 131.05375 |
| Median Absolute Deviation (MAD) | 10 |
| Skewness | 0.78683803 |
| Sum | 104843 |
| Variance | 290.79435 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 120 | 111 | 13.9% |
| 130 | 95 | 11.9% |
| 140 | 86 | 10.8% |
| 110 | 50 | 6.2% |
| 150 | 42 | 5.2% |
| 138 | 33 | 4.1% |
| 128 | 33 | 4.1% |
| 125 | 26 | 3.2% |
| 132 | 23 | 2.9% |
| 112 | 23 | 2.9% |
| Other values (39) | 278 |
| Value | Count | Frequency (%) |
| 94 | 3 | 0.4% |
| 100 | 13 | 1.6% |
| 101 | 3 | 0.4% |
| 102 | 5 | 0.6% |
| 104 | 2 | 0.2% |
| 105 | 7 | 0.9% |
| 106 | 3 | 0.4% |
| 108 | 15 | 1.9% |
| 110 | 50 | |
| 112 | 23 |
| Value | Count | Frequency (%) |
| 200 | 3 | 0.4% |
| 192 | 3 | 0.4% |
| 180 | 5 | 0.6% |
| 178 | 5 | 0.6% |
| 174 | 2 | 0.2% |
| 172 | 1 | 0.1% |
| 170 | 12 | |
| 165 | 4 | 0.5% |
| 164 | 3 | 0.4% |
| 160 | 20 |
Mental_Health
Real number (ℝ)
| Distinct | 28 |
|---|---|
| Distinct (%) | 3.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 17.345 |
| Minimum | 0 |
| Maximum | 29 |
| Zeros | 4 |
| Zeros (%) | 0.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 44.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 8 |
| Q1 | 13 |
| median | 18 |
| Q3 | 21 |
| 95-th percentile | 25 |
| Maximum | 29 |
| Range | 29 |
| Interquartile range (IQR) | 8 |
Descriptive statistics
| Standard deviation | 5.3851392 |
|---|---|
| Coefficient of variation (CV) | 0.31047214 |
| Kurtosis | -0.14623242 |
| Mean | 17.345 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | -0.51235392 |
| Sum | 13876 |
| Variance | 28.999725 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 20 | 81 | 10.1% |
| 16 | 68 | 8.5% |
| 19 | 65 | 8.1% |
| 23 | 59 | 7.4% |
| 18 | 56 | 7.0% |
| 21 | 52 | 6.5% |
| 13 | 49 | 6.1% |
| 22 | 49 | 6.1% |
| 17 | 36 | 4.5% |
| 12 | 33 | 4.1% |
| Other values (18) | 252 |
| Value | Count | Frequency (%) |
| 0 | 4 | 0.5% |
| 3 | 2 | 0.2% |
| 4 | 3 | 0.4% |
| 5 | 11 | |
| 6 | 3 | 0.4% |
| 7 | 16 | |
| 8 | 15 | |
| 9 | 26 | |
| 10 | 19 | |
| 11 | 27 |
| Value | Count | Frequency (%) |
| 29 | 4 | 0.5% |
| 28 | 6 | 0.8% |
| 27 | 4 | 0.5% |
| 26 | 12 | 1.5% |
| 25 | 24 | 3.0% |
| 24 | 31 | 3.9% |
| 23 | 59 | |
| 22 | 49 | |
| 21 | 52 | |
| 20 | 81 |
Physical_Health
Real number (ℝ)
| Distinct | 24 |
|---|---|
| Distinct (%) | 3.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.55875 |
| Minimum | 0 |
| Maximum | 30 |
| Zeros | 311 |
| Zeros (%) | 38.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 44.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 3 |
| Q3 | 7 |
| 95-th percentile | 16 |
| Maximum | 30 |
| Range | 30 |
| Interquartile range (IQR) | 7 |
Descriptive statistics
| Standard deviation | 5.4491894 |
|---|---|
| Coefficient of variation (CV) | 1.1953253 |
| Kurtosis | 1.5741974 |
| Mean | 4.55875 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 1.345579 |
| Sum | 3647 |
| Variance | 29.693666 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 311 | 38.9% |
| 5 | 53 | 6.6% |
| 2 | 51 | 6.4% |
| 4 | 51 | 6.4% |
| 7 | 40 | 5.0% |
| 9 | 38 | 4.8% |
| 6 | 36 | 4.5% |
| 3 | 34 | 4.2% |
| 1 | 30 | 3.8% |
| 8 | 27 | 3.4% |
| Other values (14) | 129 |
| Value | Count | Frequency (%) |
| 0 | 311 | 38.9% |
| 1 | 30 | 3.8% |
| 2 | 51 | 6.4% |
| 3 | 34 | 4.2% |
| 4 | 51 | 6.4% |
| 5 | 53 | 6.6% |
| 6 | 36 | 4.5% |
| 7 | 40 | 5.0% |
| 8 | 27 | 3.4% |
| 9 | 38 | 4.8% |
| Value | Count | Frequency (%) |
| 30 | 1 | 0.1% |
| 27 | 3 | 0.4% |
| 21 | 4 | 0.5% |
| 20 | 4 | 0.5% |
| 19 | 9 | |
| 18 | 4 | 0.5% |
| 17 | 13 | |
| 16 | 7 | |
| 15 | 10 | |
| 14 | 12 |
Exercise
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 39.3 KiB |
| Value | Count | Frequency (%) |
| False | 536 | |
| True | 264 |
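The Frequency (%) column is each count divided by the total number of rows. A minimal sketch using the Exercise counts above, assuming the report rounds to one decimal place as it does in its other tables:

```python
# Counts copied from the Exercise table above.
counts = {"False": 536, "True": 264}
total = sum(counts.values())

# Print rows in the same Value / Count / Frequency (%) layout.
for value, count in counts.items():
    print(f"| {value} | {count} | {count / total:.1%} |")
```

The same calculation applies to any Value/Count table in the report where the frequency cell is empty.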
Smoking_Habit
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 39.3 KiB |
| Value | Count | Frequency (%) |
| False | 673 | |
| True | 127 | 15.9% |
Drinking_Habit
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 44.8 KiB |
Length
| Max length | 36 |
|---|---|
| Median length | 35 |
| Mean length | 34.535 |
| Min length | 34 |
Characters and Unicode
| Total characters | 27628 |
|---|---|
| Distinct characters | 21 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | I usually consume alcohol every day |
|---|---|
| 2nd row | I consider myself a social drinker |
| 3rd row | I consider myself a social drinker |
| 4th row | I usually consume alcohol every day |
| 5th row | I consider myself a social drinker |
Common Values
| Value | Count | Frequency (%) |
| I usually consume alcohol every day | 406 | |
| I consider myself a social drinker | 383 | |
| I do not consume any type of alcohol | 11 | 1.4% |
Words
| Value | Count | Frequency (%) |
| i | 800 | |
| consume | 417 | |
| alcohol | 417 | |
| usually | 406 | |
| every | 406 | |
| day | 406 | |
| consider | 383 | |
| myself | 383 | |
| a | 383 | |
| social | 383 | |
| Other values (6) | 438 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 4022 | 14.6% |
| l | 2412 | 8.7% |
| e | 2389 | 8.6% |
| o | 2050 | 7.4% |
| a | 2006 | 7.3% |
| s | 1972 | 7.1% |
| y | 1623 | 5.9% |
| c | 1600 | 5.8% |
| r | 1555 | 5.6% |
| u | 1229 | 4.4% |
| Other values (11) | 6770 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 22806 | |
| Space Separator | 4022 | 14.6% |
| Uppercase Letter | 800 | 2.9% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| l | 2412 | |
| e | 2389 | |
| o | 2050 | |
| a | 2006 | |
| s | 1972 | |
| y | 1623 | 7.1% |
| c | 1600 | 7.0% |
| r | 1555 | 6.8% |
| u | 1229 | 5.4% |
| n | 1205 | 5.3% |
| Other values (9) | 4765 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 4022 |
Uppercase Letter
| Value | Count | Frequency (%) |
| I | 800 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 23606 | |
| Common | 4022 | 14.6% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| l | 2412 | |
| e | 2389 | |
| o | 2050 | 8.7% |
| a | 2006 | 8.5% |
| s | 1972 | 8.4% |
| y | 1623 | 6.9% |
| c | 1600 | 6.8% |
| r | 1555 | 6.6% |
| u | 1229 | 5.2% |
| n | 1205 | 5.1% |
| Other values (10) | 5565 |
Common
| Value | Count | Frequency (%) |
| (space) | 4022 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 27628 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 4022 | 14.6% |
| l | 2412 | 8.7% |
| e | 2389 | 8.6% |
| o | 2050 | 7.4% |
| a | 2006 | 7.3% |
| s | 1972 | 7.1% |
| y | 1623 | 5.9% |
| c | 1600 | 5.8% |
| r | 1555 | 5.6% |
| u | 1229 | 4.4% |
| Other values (11) | 6770 |
Fruit_Habit
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 44.8 KiB |
Length
| Max length | 47 |
|---|---|
| Median length | 47 |
| Mean length | 40.85 |
| Min length | 29 |
Characters and Unicode
| Total characters | 32680 |
|---|---|
| Distinct characters | 30 |
| Distinct categories | 5 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Less than 1. I do not consume fruits every day. |
|---|---|
| 2nd row | Less than 1. I do not consume fruits every day. |
| 3rd row | Less than 1. I do not consume fruits every day. |
| 4th row | Less than 1. I do not consume fruits every day. |
| 5th row | 1 to 2 pieces of fruit in average |
Common Values
| Value | Count | Frequency (%) |
| Less than 1. I do not consume fruits every day. | 452 | 56.5% |
| 1 to 2 pieces of fruit in average | 175 | 21.9% |
| 3 to 4 pieces of fruit in average | 105 | 13.1% |
| 5 to 6 pieces of fruit in average | 56 | 7.0% |
| More than six pieces of fruit | 12 | 1.5% |
Words
| Value | Count | Frequency (%) |
| 1 | 627 | 8.6% |
| than | 464 | 6.4% |
| less | 452 | 6.2% |
| i | 452 | 6.2% |
| do | 452 | 6.2% |
| not | 452 | 6.2% |
| consume | 452 | 6.2% |
| fruits | 452 | 6.2% |
| every | 452 | 6.2% |
| day | 452 | 6.2% |
| Other values (13) | 2573 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 6480 | 19.8% |
| e | 3188 | 9.8% |
| s | 2168 | 6.6% |
| o | 2052 | 6.3% |
| t | 2052 | 6.3% |
| n | 1704 | 5.2% |
| r | 1600 | 4.9% |
| a | 1588 | 4.9% |
| i | 1496 | 4.6% |
| u | 1252 | 3.8% |
| Other values (20) | 9100 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 23256 | |
| Space Separator | 6480 | 19.8% |
| Decimal Number | 1124 | 3.4% |
| Uppercase Letter | 916 | 2.8% |
| Other Punctuation | 904 | 2.8% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 3188 | |
| s | 2168 | |
| o | 2052 | |
| t | 2052 | |
| n | 1704 | 7.3% |
| r | 1600 | 6.9% |
| a | 1588 | 6.8% |
| i | 1496 | 6.4% |
| u | 1252 | 5.4% |
| f | 1148 | 4.9% |
| Other values (9) | 5008 |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 627 | |
| 2 | 175 | 15.6% |
| 3 | 105 | 9.3% |
| 4 | 105 | 9.3% |
| 5 | 56 | 5.0% |
| 6 | 56 | 5.0% |
Uppercase Letter
| Value | Count | Frequency (%) |
| L | 452 | |
| I | 452 | |
| M | 12 | 1.3% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 6480 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 904 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 24172 | |
| Common | 8508 | 26.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 3188 | |
| s | 2168 | 9.0% |
| o | 2052 | 8.5% |
| t | 2052 | 8.5% |
| n | 1704 | 7.0% |
| r | 1600 | 6.6% |
| a | 1588 | 6.6% |
| i | 1496 | 6.2% |
| u | 1252 | 5.2% |
| f | 1148 | 4.7% |
| Other values (12) | 5924 |
Common
| Value | Count | Frequency (%) |
| (space) | 6480 | 76.2% |
| . | 904 | 10.6% |
| 1 | 627 | 7.4% |
| 2 | 175 | 2.1% |
| 3 | 105 | 1.2% |
| 4 | 105 | 1.2% |
| 5 | 56 | 0.7% |
| 6 | 56 | 0.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 32680 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 6480 | 19.8% |
| e | 3188 | 9.8% |
| s | 2168 | 6.6% |
| o | 2052 | 6.3% |
| t | 2052 | 6.3% |
| n | 1704 | 5.2% |
| r | 1600 | 4.9% |
| a | 1588 | 4.9% |
| i | 1496 | 4.6% |
| u | 1252 | 3.8% |
| Other values (20) | 9100 |
Water_Habit
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 44.8 KiB |
Length
| Max length | 46 |
|---|---|
| Median length | 32 |
| Mean length | 37.11 |
| Min length | 22 |
Characters and Unicode
| Total characters | 29688 |
|---|---|
| Distinct characters | 19 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Between one liter and two liters |
|---|---|
| 2nd row | Between one liter and two liters |
| 3rd row | More than half a liter but less than one liter |
| 4th row | More than half a liter but less than one liter |
| 5th row | More than half a liter but less than one liter |
Common Values
| Value | Count | Frequency (%) |
| Between one liter and two liters | 364 | 45.5% |
| More than half a liter but less than one liter | 352 | 44.0% |
| Less than half a liter | 84 | 10.5% |
Words
| Value | Count | Frequency (%) |
| liter | 1152 | |
| than | 788 | |
| one | 716 | |
| half | 436 | 7.1% |
| a | 436 | 7.1% |
| less | 436 | 7.1% |
| between | 364 | 5.9% |
| and | 364 | 5.9% |
| two | 364 | 5.9% |
| liters | 364 | 5.9% |
| Other values (2) | 704 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 5324 | 17.9% |
| e | 4112 | |
| t | 3384 | |
| l | 2304 | |
| n | 2232 | |
| a | 2024 | 6.8% |
| r | 1868 | 6.3% |
| i | 1516 | 5.1% |
| o | 1432 | 4.8% |
| s | 1236 | 4.2% |
| Other values (9) | 4256 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 23564 | |
| Space Separator | 5324 | 17.9% |
| Uppercase Letter | 800 | 2.7% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 4112 | |
| t | 3384 | |
| l | 2304 | |
| n | 2232 | |
| a | 2024 | |
| r | 1868 | |
| i | 1516 | 6.4% |
| o | 1432 | 6.1% |
| s | 1236 | 5.2% |
| h | 1224 | 5.2% |
| Other values (5) | 2232 |
Uppercase Letter
| Value | Count | Frequency (%) |
| B | 364 | |
| M | 352 | |
| L | 84 | 10.5% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 5324 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 24364 | |
| Common | 5324 | 17.9% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 4112 | |
| t | 3384 | |
| l | 2304 | |
| n | 2232 | |
| a | 2024 | |
| r | 1868 | |
| i | 1516 | 6.2% |
| o | 1432 | 5.9% |
| s | 1236 | 5.1% |
| h | 1224 | 5.0% |
| Other values (8) | 3032 |
Common
| Value | Count | Frequency (%) |
| (space) | 5324 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 29688 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 5324 | 17.9% |
| e | 4112 | |
| t | 3384 | |
| l | 2304 | |
| n | 2232 | |
| a | 2024 | 6.8% |
| r | 1868 | 6.3% |
| i | 1516 | 5.1% |
| o | 1432 | 4.8% |
| s | 1236 | 4.2% |
| Other values (9) | 4256 |
Disease
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 44.8 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 800 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 0 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 411 | 51.4% |
| 0 | 389 | 48.6% |
Words
| Value | Count | Frequency (%) |
| 1 | 411 | |
| 0 | 389 |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 411 | |
| 0 | 389 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 800 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 411 | |
| 0 | 389 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 800 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 1 | 411 | |
| 0 | 389 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 800 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 1 | 411 | |
| 0 | 389 |
Auto
The auto setting is an interpretable pairwise column metric based on the following mapping (Variable_type-Variable_type : Method, Range):
- Categorical-Categorical : Cramér's V, [0,1]
- Numerical-Categorical : Cramér's V, [0,1] (using a discretized numerical column)
- Numerical-Numerical : Spearman's ρ, [-1,1]
This configuration uses the recommended metric for each pair of columns.
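The mapping above can be read as a small dispatch rule. The following is only an illustrative sketch: the function name `pick_metric` and the two type labels are hypothetical and are not taken from the profiling library. It also assumes every column has already been classified as either numerical or categorical.

```python
from typing import Tuple

# Hypothetical type labels; the mapping in this report only distinguishes
# numerical and categorical columns.
NUMERICAL = "Numerical"
CATEGORICAL = "Categorical"


def pick_metric(type_a: str, type_b: str) -> Tuple[str, Tuple[float, float]]:
    """Return (method name, value range) for a pair of column types."""
    for t in (type_a, type_b):
        if t not in (NUMERICAL, CATEGORICAL):
            raise ValueError(f"unknown variable type: {t!r}")
    if type_a == NUMERICAL and type_b == NUMERICAL:
        return "Spearman's rho", (-1.0, 1.0)
    # Any pair involving a categorical column uses Cramér's V; a numerical
    # partner would first be discretized into bins.
    return "Cramér's V", (0.0, 1.0)


print(pick_metric(NUMERICAL, NUMERICAL))
print(pick_metric(NUMERICAL, CATEGORICAL))
```

The rule is symmetric in its arguments, so the order of the two columns does not change the chosen method.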
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
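As a minimal sketch of that calculation, the snippet below ranks both variables (tied values share the mean of their ranks, which is an assumption about tie handling) and then applies the covariance-over-standard-deviations formula to the ranks. The helper names and the toy data are illustrative and do not come from the report.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of standard deviations.

    The 1/n factors cancel, so plain sums of deviations are enough.
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship (toy data).
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print("rho =", spearman(x, y))
print("r   =", pearson(x, y))
```

On this toy data the ranks of X and Y coincide, so ρ reaches its maximum while Pearson's r stays below 1.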
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
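The invariance property can be checked numerically. The sketch below uses the Height and Weight values of the first five sample rows of this report and recomputes r after a hypothetical change of units and offset; the function name and the conversion constants are illustrative choices only.

```python
from statistics import mean


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Height and Weight from the first five sample rows of the report.
height = [155, 173, 162, 180, 180]
weight = [67, 88, 68, 66, 58]

r_original = pearson(height, weight)

# Rescale height (positive factor) and both rescale and shift weight.
height_rescaled = [h / 2.54 for h in height]
weight_shifted = [2.20462 * w + 10 for w in weight]
r_transformed = pearson(height_rescaled, weight_shifted)

print(r_original, r_transformed)
```

Both calls give the same value up to floating-point error. A negative scale factor flips the sign of r but not its magnitude.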
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
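The pair-counting definition translates directly into code. This sketch implements exactly the formula stated above (often called τ-a), in which tied pairs count as neither concordant nor discordant. Library implementations may apply a tie correction, so results can differ on data with ties. The function name and toy inputs are illustrative.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / total pairs, without tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1   # both variables move in the same direction
        elif s < 0:
            discordant += 1   # the variables move in opposite directions
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# One swapped pair out of six possible pairs: 5 concordant, 1 discordant.
print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))
```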
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proven to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available for Phik.
| | PatientID | Name | Birth_Year | Region | Education | Height | Weight | Checkup | Diabetes | High_Cholesterol | Blood_Pressure | Mental_Health | Physical_Health | Exercise | Smoking_Habit | Drinking_Habit | Fruit_Habit | Water_Habit | Disease |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1167 | Mrs. Stephanie Gay | 1965 | London | High School Incomplete (10th to 11th grade) | 155 | 67 | More than 3 years | Neither I nor my immediate family have diabetes. | 358 | 120 | 21 | 2 | Yes | No | I usually consume alcohol every day | Less than 1. I do not consume fruits every day. | Between one liter and two liters | 1 |
| 1 | 1805 | Mr. Sherman Nero | 1969 | South West | High School Incomplete (10th to 11th grade) | 173 | 88 | Not sure | Neither I nor my immediate family have diabetes. | 230 | 142 | 9 | 0 | Yes | No | I consider myself a social drinker | Less than 1. I do not consume fruits every day. | Between one liter and two liters | 1 |
| 2 | 1557 | Mr. Mark Boller | 1974 | Yorkshire and the Humber | Elementary School (1st to 9th grade) | 162 | 68 | More than 3 years | Neither I nor my immediate family have diabetes. | 226 | 122 | 26 | 0 | No | No | I consider myself a social drinker | Less than 1. I do not consume fruits every day. | More than half a liter but less than one liter | 1 |
| 3 | 1658 | Mr. David Caffee | 1958 | London | University Complete (3 or more years) | 180 | 66 | Not sure | I have/had pregnancy diabetes or borderline diabetes | 313 | 125 | 13 | 8 | Yes | No | I usually consume alcohol every day | Less than 1. I do not consume fruits every day. | More than half a liter but less than one liter | 0 |
| 4 | 1544 | Mr. Gerald Emery | 1968 | South East | University Incomplete (1 to 2 years) | 180 | 58 | More than 3 years | I have/had pregnancy diabetes or borderline diabetes | 277 | 125 | 18 | 2 | No | No | I consider myself a social drinker | 1 to 2 pieces of fruit in average | More than half a liter but less than one liter | 1 |
| 5 | 1653 | Mr. David Lamothe | 1966 | East Midlands | NaN | 167 | 49 | Not sure | Neither I nor my immediate family have diabetes. | 287 | 130 | 7 | 7 | Yes | Yes | I consider myself a social drinker | Less than 1. I do not consume fruits every day. | More than half a liter but less than one liter | 0 |
| 6 | 1422 | Mrs. Patricia Byrne | 1965 | Yorkshire and the Humber | High School Graduate | 158 | 63 | More than 3 years | Neither I nor my immediate family have diabetes. | 358 | 120 | 21 | 2 | Yes | No | I usually consume alcohol every day | Less than 1. I do not consume fruits every day. | Less than half a liter | 1 |
| 7 | 1806 | Mr. Wesley Shoemaker | 1965 | West Midlands | High School Graduate | 178 | 67 | Less than 3 years but more than 1 year | Neither I nor my immediate family have diabetes. | 280 | 150 | 9 | 2 | Yes | No | I consider myself a social drinker | 1 to 2 pieces of fruit in average | More than half a liter but less than one liter | 0 |
| 8 | 1703 | Mr. Billy Kirkland | 1965 | East of England | High School Graduate | 162 | 63 | Less than 3 years but more than 1 year | Neither I nor my immediate family have diabetes. | 205 | 110 | 12 | 7 | Yes | No | I usually consume alcohol every day | Less than 1. I do not consume fruits every day. | Between one liter and two liters | 1 |
| 9 | 1370 | Mrs. Tina Morris | 1979 | East Midlands | High School Graduate | 154 | 51 | Not sure | Neither I nor my immediate family have diabetes. | 345 | 132 | 14 | 14 | Yes | Yes | I consider myself a social drinker | Less than 1. I do not consume fruits every day. | Between one liter and two liters | 0 |
| | PatientID | Name | Birth_Year | Region | Education | Height | Weight | Checkup | Diabetes | High_Cholesterol | Blood_Pressure | Mental_Health | Physical_Health | Exercise | Smoking_Habit | Drinking_Habit | Fruit_Habit | Water_Habit | Disease |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 790 | 1239 | Mrs. Melanie Pope | 1960 | East of England | University Complete (3 or more years) | 154 | 48 | More than 3 years | Neither I nor my immediate family have diabetes. | 248 | 150 | 19 | 6 | Yes | No | I usually consume alcohol every day | Less than 1. I do not consume fruits every day. | Between one liter and two liters | 0 |
| 791 | 1782 | Mrs. Kim Kling | 1958 | North West | University Complete (3 or more years) | 157 | 51 | More than 3 years | Neither I nor my immediate family have diabetes. | 307 | 130 | 11 | 9 | No | No | I usually consume alcohol every day | 3 to 4 pieces of fruit in average | More than half a liter but less than one liter | 1 |
| 792 | 1695 | Mr. Michael Thomas | 1987 | East Midlands | High School Graduate | 171 | 76 | Not sure | Neither I nor my immediate family have diabetes. | 286 | 126 | 19 | 0 | Yes | No | I consider myself a social drinker | Less than 1. I do not consume fruits every day. | More than half a liter but less than one liter | 0 |
| 793 | 1590 | Mrs. Crystal Comeaux | 1948 | London | University Complete (3 or more years) | 157 | 69 | More than 3 years | I do have diabetes | 273 | 120 | 11 | 0 | Yes | No | I consider myself a social drinker | 1 to 2 pieces of fruit in average | Less than half a liter | 1 |
| 794 | 1912 | Mr. Mike Jefferson | 1987 | Yorkshire and the Humber | High School Graduate | 173 | 74 | Not sure | Neither I nor my immediate family have diabetes. | 202 | 120 | 13 | 7 | Yes | No | I usually consume alcohol every day | Less than 1. I do not consume fruits every day. | Between one liter and two liters | 0 |
| 795 | 1909 | Mr. Philip Klink | 1972 | East Midlands | High School Incomplete (10th to 11th grade) | 178 | 61 | Not sure | Neither I nor my immediate family have diabetes. | 204 | 144 | 12 | 4 | Yes | No | I consider myself a social drinker | Less than 1. I do not consume fruits every day. | Between one liter and two liters | 0 |
| 796 | 1386 | Mrs. Jackie Valencia | 1980 | North West | Elementary School (1st to 9th grade) | 157 | 61 | More than 3 years | I have/had pregnancy diabetes or borderline diabetes | 213 | 120 | 23 | 0 | No | No | I usually consume alcohol every day | Less than 1. I do not consume fruits every day. | Between one liter and two liters | 1 |
| 797 | 1088 | Mrs. Cheryl Harris | 1860 | East Midlands | Elementary School (1st to 9th grade) | 167 | 48 | More than 3 years | Neither I nor my immediate family have diabetes. | 272 | 140 | 20 | 17 | No | No | I consider myself a social drinker | 3 to 4 pieces of fruit in average | More than half a liter but less than one liter | 0 |
| 798 | 1662 | Mr. Florencio Doherty | 1975 | East of England | Elementary School (1st to 9th grade) | 165 | 75 | More than 3 years | Neither I nor my immediate family have diabetes. | 208 | 112 | 16 | 0 | No | No | I usually consume alcohol every day | Less than 1. I do not consume fruits every day. | More than half a liter but less than one liter | 1 |
| 799 | 1117 | Mr. Freddie Vermillion | 1979 | London | Elementary School (1st to 9th grade) | 173 | 70 | Not sure | Neither I nor my immediate family have diabetes. | 181 | 120 | 11 | 12 | Yes | No | I consider myself a social drinker | Less than 1. I do not consume fruits every day. | Less than half a liter | 0 |